An Interactive Classification of Web Documents by Self-Organizing Maps and Search Engines

Authors

  • Kenji Hatano
  • Ryouichi Sano
  • Yiwei Duan
  • Katsumi Tanaka
Abstract

In this paper, we propose an effective classification view mechanism for hypertext data such as web documents, based on Kohonen’s Self-Organizing Map (SOM) and search engines. Web documents collected by search engines are automatically classified by SOM, and the obtained SOMs are incrementally modified according to the interaction between users and SOMs. At present, various search engines are designed to retrieve web documents. When we use search engines to retrieve web documents, we get more answers than ever before, so examining each web document takes a great deal of labor. Therefore, to complement search engines, we need a function that classifies web documents according to the user’s point of view and purpose. Furthermore, conventional search engines cannot retrieve pertinent web documents when a specific topic is described by more than one web document. To solve these problems, we developed a content-based clustering system for web documents. In this system, web documents are automatically clustered by feature vectors produced from individual web documents or from minimal subgraphs consisting of multiple web documents, and their overview maps are dynamically generated by SOM. Furthermore, we propose a method by which an obtained SOM is modified through user interaction such as feedback operations. It is important for the system to reflect the aim of classification and the purpose of retrieval. In our research, we intend to solve these problems by providing a view mechanism in which the Basic Units for retrieval and clustering of Web Documents (BUWDs) are changeable by users, and relevance feedback operations enable the generation of an overview map that reflects user needs.
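
The clustering step described in the abstract (feature vectors mapped onto a SOM to obtain an overview map) can be illustrated with a small sketch. This is not the authors' implementation: the toy documents, the plain term-frequency features, the 3x3 grid, the decay schedules, and all function names are assumptions chosen for illustration, and each toy document stands in for one retrieval unit.

```python
import math
import random

def term_frequency_vector(text, vocabulary):
    """Return an L2-normalised term-frequency vector for one document."""
    words = text.lower().split()
    counts = [float(words.count(term)) for term in vocabulary]
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]

def best_matching_unit(weights, vector):
    """Return the (row, col) of the map node whose weights are closest."""
    best, best_dist = None, float("inf")
    for node, w in weights.items():
        dist = sum((a - b) ** 2 for a, b in zip(w, vector))
        if dist < best_dist:
            best, best_dist = node, dist
    return best

def train_som(vectors, rows=3, cols=3, epochs=200, seed=0):
    """Train a small rectangular SOM (assumed linear decay schedules)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        rate = 0.5 * (1.0 - frac)                       # learning rate
        radius = max(rows, cols) / 2.0 * (1.0 - frac) + 0.5
        for vector in vectors:
            b_row, b_col = best_matching_unit(weights, vector)
            for (r, c), w in weights.items():
                grid_dist2 = (r - b_row) ** 2 + (c - b_col) ** 2
                influence = math.exp(-grid_dist2 / (2.0 * radius ** 2))
                # Pull the node towards the input, scaled by its
                # distance from the best-matching unit on the grid.
                for i in range(dim):
                    w[i] += rate * influence * (vector[i] - w[i])
    return weights

# Hypothetical toy collection standing in for search-engine results.
documents = {
    "doc_a": "neural network learning neural model",
    "doc_b": "network learning model training",
    "doc_c": "recipe pasta tomato sauce",
    "doc_d": "pasta sauce recipe garlic",
}
vocabulary = sorted({w for text in documents.values() for w in text.split()})
vectors = {name: term_frequency_vector(text, vocabulary)
           for name, text in documents.items()}
som = train_som(list(vectors.values()))

# The overview map assigns each document to a grid cell.
overview = {name: best_matching_unit(som, vec) for name, vec in vectors.items()}
for name, cell in overview.items():
    print(name, "->", cell)
```

Documents that share vocabulary tend to land in the same or neighbouring cells, which is what makes the grid usable as an overview. The user-driven modification of the map mentioned in the abstract is not covered by this sketch.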


Similar resources

SOMSE: A Neural Network Based Approach to Web Search Optimization

Conventional Web search engines return long lists of ranked documents that users are forced to sift through to find relevant documents. The notoriously low precision of Web search engines, coupled with the ranked list presentation, makes it hard for users to find the information they are looking for. One of the fundamental issues of information retrieval is searching for compromises between precis...

Comparison of Two “Document Similarity Search Engines”

We have developed and used the “CDS document map” (http://simbad.u-strasbg.fr/A+A/map.pl), based on neural networks (Kohonen maps). In this self-organizing map, documents are gradually clustered by subject themes. The tool is based on keywords associated with the documents. For one selected document, we locate it on the CDS document map and retrieve articles clustered in the same area. The second s...

The Organization of Internet Web Pages Using Wordnet and Self-organizing Maps

I wish to thank my thesis adviser, Dr. Diane Cook, for her support and guidance. Dr. Cook consistently provided positive feedback and helped me to stay on track. With the Internet increasing in size at a rapid rate, locating information is becoming more difficult. Many people use traditional search engines, such as Altavista, to locate information, but they find that these se...

Exploration of Full-text Databases with Self-organizing Maps

Availability of large full-text document collections in electronic form has created a need for intelligent information retrieval techniques. Especially the expanding World Wide Web presupposes methods for systematic exploration of miscellaneous document collections. In this paper we introduce a new method, the WEBSOM, for this task. Self-Organizing Maps (SOMs) are used to represent documents on...

An Ensemble Click Model for Web Document Ranking

Every year, web search engine providers spend more and more money on document ranking in search engine result pages (SERPs). Click models provide useful information for ranking documents in SERPs by modeling interactions between users and search engines. Here, three modules are employed to create a hybrid click model; the first module is a PGM-based click model, the second module is a d...

Publication date: 1999